-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1] Refactor representation of set ops #1538
Conversation
CROSS-ENGINE Conformance Report ❌
Testing DetailsResult Details
Now Failing Tests ❌The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. Now Passing Tests492 test(s) were previously failing in BASE (LEGACY-FAB43BF) but now pass in TARGET (EVAL-FAB43BF). Before merging, confirm they are intended to pass. The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact. CROSS-COMMIT-LEGACY Conformance Report ❌
Testing DetailsResult Details
Now Failing Tests ❌The following 5 test(s) were previously PASSING in BASE but are now FAILING in TARGET: Click here to see
CROSS-COMMIT-EVAL Conformance Report ✅
Testing DetailsResult Details
Now Passing TestsThe following 6 test(s) were previously FAILING in BASE but are now PASSING in TARGET. Before merging, confirm they are intended to pass: Click here to see
|
partiql-eval/src/main/kotlin/org/partiql/eval/internal/helpers/ToBag.kt
Outdated
Show resolved
Hide resolved
fb73da5
to
ca557f6
Compare
type: '.set_op', | ||
operand: '.expr.s_f_w', | ||
}, | ||
query_set::{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) primary addition of this PR is adding query_set
and query_expr
to the AST. This modeling allows us to define ORDER BY
, LIMIT
, and OFFSET
on SFW queries as well as SQL set ops.
Modeling is partially based off
- partiql-lang-rust -- https://github.com/partiql/partiql-lang-rust/blob/e5b8ce7497f515bc998a31cd8b45fb2f3e9287a2/partiql-ast/src/ast.rs#L238-L244
- similar modeling in the sqlparser Rust crate -- https://docs.rs/sqlparser/0.48.0/sqlparser/ast/struct.Query.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Would we want to add the naming followed by PLR and SQLParser? AKA Query
is the top-level product type holding the expression, order-by, etc. And the union types QuerySet
(PLR) or SetExpr
(SQLParser) holding the different variants (expr, select, wrapped Query, values, etc).
Especially since the Java name will now be: Expr.QuerySet
and QueryExpr
. I found it a bit confusing in the rest of the code. Could just be: Expr.Query
and Expr.Query.Body
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more context on PLR's AST definition in the partiql-tests PR partiql/partiql-tests#121 (comment). The AST definition I used in this PR allows for us to be more strict in what expressions we want to support with LIMIT
, OFFSET
, and ORDER BY
. For now, I'm thinking we only permit those clauses on
- SFW
- SQL set ops
- PartiQL bag ops
PLR also allows those clauses on general expressions. Since it's not in the PartiQL spec, I haven't chosen to allow those clauses on general expressions.
I agree that the previous naming was confusing, so updated to use query_body
rather than query_expr
.
@@ -320,7 +320,14 @@ class QueryPrettyPrinter { | |||
is PartiqlAst.Expr.Or -> writeNAryOperator("OR", node.operands, sb, level) | |||
is PartiqlAst.Expr.InCollection -> writeNAryOperator("IN", node.operands, sb, level) | |||
is PartiqlAst.Expr.BagOp -> { | |||
var name = node.op.javaClass.simpleName.toUpperCase().replace("_", " ") | |||
var name = when (node.op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) changes to partiql-lang
tests follow what I did for a similar PR targeting v0.14.6
to fix some legacy tests -- #1506.
@@ -322,7 +342,7 @@ rel::{ | |||
_: [ | |||
call::{ | |||
agg: ref, | |||
set_quantifier: [ ALL, DISTINCT ], | |||
setq: set_quantifier, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) moved set_quantifier
out of this nested aggregation plan node. Also renamed to setq
for consistency w/ the set ops.
test/partiql-tests
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) currently points to the source branch of partiql/partiql-tests#121. Once that partiql-tests
PR is merged in, I will update the submodule commit.
@OptIn(PartiQLValueExperimental::class) | ||
internal fun PartiQLValue.toBag(): BagValue<*> { | ||
TODO("For OUTER set operators") | ||
internal fun Datum.asIterator(coerce: Boolean): Iterator<Datum> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) coercion behavior is slightly different than other type check exceptions in permissive mode. For the bag operators, we coerce to a bag rather than return missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to go through the evaluation implementations for the bag ops -- but overall looks good. I've left some comments on the modelling.
order=orderByClause?? | ||
limit=limitClause?? | ||
offset=offsetByClause?? # SfwQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think. Can we remove these completely? Since the AST nodes don't allow for this, this may result in ambiguous results. I believe SQL would only allow this by using parenthesis -- which I believe would be handled even if removing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the AST nodes don't allow for this, this may result in ambiguous results.
Could you clarify on this? Are you referring to if a set op query is defined without parens and it includes a LIMIT
, OFFSET
, or ORDER BY
? For example,
SELECT * FROM a UNION SELECT * FROM b LIMIT 5
In this case, I believe the parsing behavior we want is to apply the LIMIT
to the UNION
set operator. This behavior follows what PostgreSQL's parsing does - https://www.postgresql.org/docs/current/queries-union.html.
type: '.set_op', | ||
operand: '.expr.s_f_w', | ||
}, | ||
query_set::{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Would we want to add the naming followed by PLR and SQLParser? AKA Query
is the top-level product type holding the expression, order-by, etc. And the union types QuerySet
(PLR) or SetExpr
(SQLParser) holding the different variants (expr, select, wrapped Query, values, etc).
Especially since the Java name will now be: Expr.QuerySet
and QueryExpr
. I found it a bit confusing in the rest of the code. Could just be: Expr.Query
and Expr.Query.Body
.
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rex/ExprUnionAll.kt
Outdated
Show resolved
Hide resolved
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rex/ExprUnionDistinct.kt
Outdated
Show resolved
Hide resolved
Thanks for the review @johnedquinn. Based on the offline discussion, we decided on the following changes
Will mark this PR as a draft till I address those concerns and the other PR comments. |
359f674
to
22ca247
Compare
val quantifier = when (setOp.type.setq) { | ||
SetQuantifier.ALL -> Rel.Op.Set.Quantifier.ALL | ||
null, SetQuantifier.DISTINCT -> Rel.Op.Set.Quantifier.DISTINCT | ||
private fun convertSetOp(setExpr: QueryBody.SetOp): Rel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self-review) one benefit of reusing the SQL set op plan nodes for the PartiQL bag ops is that we can reuse the Rel
scan behavior for non-collections (i.e. coercing to a bag in permissive and erroring in strict mode).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v1 #1538 +/- ##
=====================================
Coverage ? 77.15%
Complexity ? 2514
=====================================
Files ? 254
Lines ? 18583
Branches ? 3523
=====================================
Hits ? 14337
Misses ? 3212
Partials ? 1034
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Once the tests PR is merged and the submodule commit is updated, I can approve.
22ca247
to
53e96c2
Compare
Since the last review,
|
fun Record.toDatumArrayCoerceMissing(): Array<Datum> = Array(this.values.size) { | ||
val d = [email protected][it] | ||
when (d.isMissing) { | ||
true -> Datum.nullValue() | ||
else -> d | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just clarifying why you're doing this: even though the comparator treats null/missing as equal, it isn't deterministic which value will be returned in the TreeSet -- so you need to coerce them ahead of time.
Also, I'm leaving a suggestion here because I don't think you need to create a new Array and move all previous pointers to the new one.
fun Record.toDatumArrayCoerceMissing(): Array<Datum> = Array(this.values.size) { | |
val d = this@toDatumArrayCoerceMissing.values[it] | |
when (d.isMissing) { | |
true -> Datum.nullValue() | |
else -> d | |
} | |
} | |
fun Array<Datum>.coerceMissing() { | |
for (i in 0 until this.size) { | |
if (this[i].isMissing) { | |
this[i] = Datum.nullValue() | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just clarifying why you're doing this: even though the comparator treats null/missing as equal, it isn't deterministic which value will be returned in the TreeSet -- so you need to coerce them ahead of time.
Yes, that's correct. I added a brief comment in the latest commit to give some more context on the behavior.
Also, I'm leaving a suggestion here because I don't think you need to create a new Array and move all previous pointers to the new one.
Thanks for the suggestion. You're right -- it can be simplified quite a bit. I applied your suggestion in the latest commit.
dc6e8f1
to
75f3f35
Compare
Relevant Issues
s_f_w.set_op
does not allow for left associative set operations #1507Description
ORDER BY
,LIMIT
,OFFSET
on the SQL set op and the SFW argumentsORDER BY
,LIMIT
,OFFSET
specified on SQL set opOther Information
Updated Unreleased Section in CHANGELOG: [NO]
Any backward-incompatible changes? [YES]
v1
branchAny new external dependencies? [NO]
Do your changes comply with the Contributing Guidelines
and Code Style Guidelines? [YES]
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.